Analytical Review of the News Data Classification Methods with Multivariate Classification Attributes

Author

  • Mandeep Kaur
Abstract

News classification has emerged as an important sub-branch of data mining. Much work has already been done on news classification with a variety of classifiers and feature descriptors, and a number of news classification systems run in real time today. News classification is an important part of online news portals, which grow in number and gain more users every year. It is a branch of text classification, or text mining, for which researchers have already developed many models based on different approaches. News items have to be classified into categories such as sports, politics, technology, business, science, health, regional news and many other similar categories. Researchers have applied many supervised and unsupervised methods to news classification, and the supervised models have been found more efficient. The major goal of news classification research is to improve accuracy while decreasing elapsed time. Our news classification model proposes the use of k-means and lexicon analysis of the news data together with the nearest neighbor algorithm. The k-means algorithm is a clustering algorithm and is used primarily to produce clusters of the text data that carry the important information. Lexicon analysis is then performed over the given text data, and the final classification of the news is done using k-nearest neighbor. The results would be obtained in terms of parameters such as accuracy and elapsed time.
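The abstract names three stages (k-means clustering, lexicon analysis, k-nearest-neighbor classification) but does not say how they are wired together. The following is a minimal sketch of one plausible arrangement, not the author's implementation. It assumes that lexicon hit counts serve as the feature vector, that k-means groups the training items, and that the k-nearest-neighbor vote is restricted to the cluster closest to the query. The word lists in `LEXICON`, the toy `TRAIN` and `TEST` sets, and all function names are hypothetical.

```python
import math
import time
from collections import Counter

# Hypothetical category lexicons; a real system would use far larger word lists.
LEXICON = {
    "sports": {"match", "team", "goal", "league", "player"},
    "business": {"market", "shares", "profit", "bank", "trade"},
    "technology": {"software", "chip", "phone", "app", "data"},
}
CATEGORIES = sorted(LEXICON)  # fixed feature order: business, sports, technology

def lexicon_features(text):
    """Lexicon analysis: count how many tokens hit each category's word list."""
    tokens = Counter(text.lower().split())
    return [float(sum(tokens[w] for w in LEXICON[c])) for c in CATEGORIES]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain k-means, initialised deterministically from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def classify(text, points, labels, centroids, assign, n_neighbors=3):
    """Pick the nearest cluster, then take a k-NN majority vote inside it."""
    q = lexicon_features(text)
    nearest = min(range(len(centroids)), key=lambda j: dist(q, centroids[j]))
    idx = [i for i, a in enumerate(assign) if a == nearest]
    if not idx:  # fall back to every training item if the cluster is empty
        idx = list(range(len(points)))
    idx.sort(key=lambda i: dist(q, points[i]))
    votes = Counter(labels[i] for i in idx[:n_neighbors])
    return votes.most_common(1)[0][0]

def evaluate(test_set, model):
    """Return (accuracy, elapsed seconds), the two measures the abstract names."""
    start = time.perf_counter()
    hits = sum(classify(text, *model) == label for text, label in test_set)
    return hits / len(test_set), time.perf_counter() - start

# Toy labelled data, made up for illustration only.
TRAIN = [
    ("the team scored a late goal in the league match", "sports"),
    ("bank shares rose as market profit grew", "business"),
    ("new phone app ships with faster chip", "technology"),
    ("player joins team before the next match", "sports"),
    ("trade deal lifts market and bank profit", "business"),
    ("software update improves app data handling", "technology"),
]
TEST = [
    ("the player scored a goal for the team", "sports"),
    ("market profit falls as bank shares drop", "business"),
    ("chip maker unveils phone software", "technology"),
]

points = [lexicon_features(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
centroids, assign = kmeans(points, k=len(CATEGORIES))
model = (points, labels, centroids, assign)
accuracy, elapsed = evaluate(TEST, model)
print(f"accuracy={accuracy:.2f} elapsed={elapsed:.6f}s")
```

Restricting the neighbor search to a single cluster is one way the clustering step could reduce elapsed time, since fewer distances are computed per query. Whether the proposed model uses the clusters this way is not stated in the abstract.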


Similar resources

Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran

Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...


A New Document Embedding Method for News Classification

Abstract- Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. There is lots of news spreading on the web. A text classifier can categorize news automatically and this facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...


Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides for its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have overtaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...


A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered as an important issue in classification domain. Selecting a good feature through maximum relevance criterion to class label and minimum redundancy among features affect improving the classification accuracy. However, most current feature selection algorithms just work with the centralized methods. In this paper, we suggest a distributed version of the mRMR featu...


A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...




Publication date: 2016